Libraries

rm(list = ls())
library(tidyverse)
library(httr2)
library(igraph)
library(visNetwork)
library(tidygraph)
library(ggraph)

Get the data

Data has been obtained following the procedure in the chunk below. The code is commented to avoid repeating the download every time the notebok is run.

# zip_data <- request("https://networks.skewed.de/net/product_space/files/SITC.csv.zip") |>
#   req_perform()
# 
# writeBin(resp_body_raw(zip_data), "exports_SITC.csv.zip")
# 
# unzip("exports_SITC.csv.zip", exdir = "network-data")
# 
# file.remove("exports_SITC.csv.zip", )
nodes <- read_csv("network-data/nodes.csv")
links <- read_csv("network-data/edges.csv")

Description of the dataset

Our network represents economic products. Two products are connected if two or more countries export both products in significant quantities (above world average). The meaning of a link is that two products are connected if the same countries “specialize” in making them, hence, basically, products are connected if they require the same capabilities to be made. Edges weights represent a similarity score (called “proximity”). Data is based on UN Comtrade worldwide trade patterns using the SITC (Standard International Trade Classification) for classifying product categories.

Source: The Product Space.

Properties:

Weighted, Undirected

Graph:

graph <- graph_from_data_frame(links, directed = FALSE, vertices = nodes)
graph
## IGRAPH cc18aaa UN-- 774 1779 -- 
## + attr: name (v/c), pid (v/n), community (v/n), size (v/n), pos (v/c),
## | leamer (v/n), color (v/c), _pos (v/c), width (e/n), color (e/c)
## + edges from cc18aaa (vertex names):
## [1] METAL FORMING MACHINE TOOLS                    --CONVERTERS,LADLES,INGOT MOULDS AND CASTING MACH.  
## [2] PRINTING PRESSES                               --OTHER MACH.-TOOLS FOR WORKING METAL OR MET.CARBIDE
## [3] OTHER FOOD PROCESSING MACHINERY AND PARTS      --PARTS OF THE MACHINERY OF 744.2-                  
## [4] PRODUCER GAS AND WATER GAS GENERATORS AND PARTS--OTHER PUMPS FOR LIQUIDS & LIQUID ELEVATORS        
## [5] PRODUCER GAS AND WATER GAS GENERATORS AND PARTS--CINEMATOGRAPHIC CAMERAS,PROJECTORS,SOUND-REC,PAR  
## + ... omitted several edges

Questions

2. What is the average degree in the network? And the standard deviation of the degree?

mean(degree(graph))
## [1] 4.596899
sd(degree(graph))
## [1] 5.994848

The average degree is 4.5969 in this network, with a standard deviation of 5.9948.

This means that on average every product is connected to 4/5 other products. Standard deviation seems to be high (higher than the mean), therefore indicating that there could be both products with a lot of edges and products with just one edge.

3. Plot the degree distribution in linear-linear scale and in log-log-scale. Does it have a typical connectivity? What is the degree of the most connected node?

# In linear-linear scale
ggplot() + 
  geom_histogram(aes(x = degree(graph)),
                 fill = "#69b3a2", color = "white", alpha = 0.8) + 
  labs(x = "Degree", 
       y = "", 
       title = "Degree distribution in linear-linear scale") +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold")
  )

# In log-log scale
ggplot() +
  geom_histogram(aes(x = degree(graph)),
                 fill = "#69b3a2", color = "white", alpha = 0.8) +
  scale_x_log10() +
  scale_y_log10() +
  labs(
    x = "Degree (log scale)",
    y = "",
    title = "Degree Distribution in Log-Log Scale"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold")
  )

We can observe that this network does not exhibit typical connectivity: Its degree distribution is highly skewed and lacks a clear peak. By being its degree distribution so broad there is not a typical connectivity for a node. Most nodes have a very low degree, while a few have very high degree. We observe a power-law-like distribution, rather than a Poisson-like distribution, where most nodes would have approximately the same number of links and no hubs. This is what can usually be observed in social networks, but our product network exhibits the same, indicating that some, but few products are exported together with many different products by many countries, while the majority of products are exported only by a few countries or by highly specialized countries.

max_degree(graph)
## [1] 43

The most connected node on our network has a degree of 43.

which.max(degree(graph))
## TRANSMISSION SHAFTS,CRANKS,BEARING HOUSINGS ETC. 
##                                              580

The node with the highest degree is that of mechanical components, indicating that there is no high specialization required to product this. These are indeed products that are probably going to be used to assemble more sophisticated goods and thus the demand for this product is high everywhere and its supply chain is likely to be very globalized with many countries producing them.

4. What is the clustering coefficient (transitivity) in the network?

transitivity(graph, type = "global")
## [1] 0.429691

The global transitivity of this network is 0.4297, which is closer to 0 than to 1, indicating a modest tendency of clustering. Less than half of the time, when two nodes share a common neighbor, they will also be directly connected to each other. This could indicate that not always countries tend to specialize on the same products.

5. What is the assortativity (degree) in the network?

assortativity_degree(graph)
## [1] 0.4571059

The assortativity coefficient of this network is 0.4571 (greater than 0), indicating a moderate to strong tendency for nodes to connect with others that have a similar degree. In other words, high-degree nodes tend to connect with other high-degree nodes, and low-degree nodes tend to connect with other low-degree nodes. This is a sign of assortative mixing.

6. Using the Louvain method, does the network have a community structure? If so, what is its modularity?

louvain_cluster <- cluster_louvain(graph, weights = E(graph)$width)

sizes(louvain_cluster)
## Community sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##  62 114  10  58 138  46  66  97  14   8  18   4  17  16   3   7  13  13  25   5 
##  21  22  23  24  25  26  27 
##   6   7   6   9   6   3   3
modularity(louvain_cluster)
## [1] 0.7446393

Yes, the network has a clear community structure. Using the Louvain method, the network was partitioned into more than 20 communities. The modularity value is >0.7, which is considered high and indicates a strong modular (community) structure within the network.

We can also try to visualize these communities:

layout <- layout_with_kk(graph)

# Using igraph only
plot(louvain_cluster, graph,
     layout = layout,
     vertex.size = 0.1,
     edge.width = 0.1,
     vertex.label = "")

# Using ggraph
graph_tbl <- as_tbl_graph(graph)
V(graph_tbl)$community <- membership(louvain_cluster)
ggraph(graph_tbl, layout = "kk") +
  geom_edge_link(width = 0.1, alpha = 0.7, color = "grey") +
  geom_node_point(aes(color = as.factor(community)), size = 1.5) +
  theme_void() +
  theme(legend.position = "none")

7. Test that the clustering coefficient in the network cannot be statistically explain by a configuration model in which the nodes have the same degree distribution as the original.

original_clustering <- transitivity(graph, type = "global")

Create 1000 random configuration models and register the clustering coefficient for each one of this generated model.

num_simulations <- 1000
# Initializing a vector
simulated_clustering <- numeric(num_simulations)
# Simulating the graphs and saving their transitivity
for (i in 1:num_simulations) {
  config_graph <- sample_degseq(degree(graph), method = "vl")  
  simulated_clustering[i] <- transitivity(config_graph, type = "global")
}

Here we use the method “vl” (Viger–Latapy) as it is the one suggested for undirected connected graphs.

Once we have generated the networks we want to proceed with evaluation and test of statistical difference.

original_clustering
## [1] 0.429691
mean(simulated_clustering)
## [1] 0.03410868
t.test(x = simulated_clustering, mu = original_clustering, alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  simulated_clustering
## t = -5078.4, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.429691
## 95 percent confidence interval:
##  0.03395582 0.03426154
## sample estimates:
##  mean of x 
## 0.03410868

We tested whether the clustering coefficient of the original network can be explained solely by its degree distribution, by comparing it to 1000 configuration model networks with the same degree sequence. The original network’s clustering coefficient was 0.4297, while the average clustering coefficient from the configuration models was 0.034.

Therefore we can conclude that the clustering structure in the original network cannot be explained by degree distribution alone — it has significant non-random structure peculiar to the network.

8. Visualize the neighborhood of the node with the largest centrality (closeness)

which.max(closeness(graph))
## SLAG WOOL.ROCK WOOL AND SIMILAR MINERAL WOOLS 
##                                           453

We discovered that the node with the largest centrality/closeness is “SLAG WOOL.ROCK WOOL AND SIMILAR MINERAL WOOLS”. Apparently these are insulating materials coming from minerals.

head(neighbors(graph,"SLAG WOOL.ROCK WOOL AND SIMILAR MINERAL WOOLS"))
## + 6/774 vertices, named, from cc18aaa:
## [1] TRAILERS & SPECIALLY DESIGNED CONTAINERS          
## [2] PARTS OF THE MACHINERY OF 723.41 TO 723.46        
## [3] MATERIALS OF RUBBER(E.G.,PASTES.PLATES,SHEETS,ETC)
## [4] OTHER VEHICLES,NOT MECHANICALLY PROPELLED,PARTS   
## [5] PARTS OF THE MACHINERY OF 744.2-                  
## [6] MISCELLANEOUS ART.OF MATERIALS OF DIV.58

These are some of the neighbors of our node of interest.

We’ll now plot its neighbors.

neigh_graph <- make_neighborhood_graph(graph,
                                       order = 1, 
                                       "SLAG WOOL.ROCK WOOL AND SIMILAR MINERAL WOOLS")[[1]]

g <- as_tbl_graph(neigh_graph)

ggraph(g, layout = 'fr') +
  geom_edge_link(alpha = 0.5, width = 1) +
  # Colouring of a different colour our central node
  geom_node_point(size = 5, 
                  aes(color = name == "SLAG WOOL.ROCK WOOL AND SIMILAR MINERAL WOOLS")) +
  geom_node_text(aes(label = name), 
                 repel = TRUE, 
                 size = 3, 
                 color = "steelblue", 
                 check_overlap = TRUE) +
  guides(color = "none") +    
  theme_void()+
  labs(title = "Neighborhood of Node with Highest Closeness Centrality")

This plot looks a little bit messy, as the labels are quite long and difficult to read. However, if we hide all the labels, we lose a lot of valuable information. Therefore, we decided to use the visNetwork package to create an interactive graph, which allows us to freely drag nodes around and better explore the names of each node:

library(visNetwork)
nodes <- data.frame(id = V(g)$name, label = V(g)$name)
edges <- data.frame(from = as.character(ends(g, E(g))[,1]), to = as.character(ends(g, E(g))[,2]))

visNetwork(nodes, edges) %>%
  visEdges(color = list(color = "gray", hover = "red")) %>%
  visNodes(size = 15, color = list(background = "lightblue", border = "darkblue")) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visLayout(randomSeed = 123)

NOTE: As these is a .pdf document we are only able to show a screeenshot of the graph